Implement open ai api llm backend #3238
Conversation
It would be useful to see an example of how a user selects this back-end in their runtime config. It might also be useful to add some discussion to the PR of the tradeoffs you considered when bundling this in with the existing remote HTTP backend.

The PR is going with bundling the open-ai api backend into the existing remote HTTP backend.
@itowlson, I just realised in my PR main comment I said "breaking it", apologies, I meant "not breaking it" 😄
crates/factor-llm/src/spin.rs (Outdated)

```rust
pub struct RemoteHttpCompute {
    url: Url,
    auth_token: String,
    custom_llm: Option<String>,
```
I think we want this to be an enum which would allow us to constrain it to only the options implemented.
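A minimal sketch of the kind of enum being suggested, assuming the two backends discussed in this PR (the variant names are illustrative, not the final API):

```rust
// Sketch only: constrain the configurable backend to the implemented options
// instead of accepting an arbitrary string. Variant names are illustrative.
#[derive(Debug, Default, serde::Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum CustomLlm {
    /// The existing remote HTTP behaviour stays the default.
    #[default]
    Default,
    /// The OpenAI-compatible API added by this PR.
    OpenAi,
}
```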
```rust
tracing::info!("Sending remote inference request to {chat_url}");

let body = CreateChatCompletionRequest {
```
I think we should use the generate API instead of Chat completions here because our interface does not really lend itself to having multiple roles. The wit interface aligns more closely with the generate API instead of chat completions.
I have looked into this option; the endpoint is /v1/responses. While it is more compatible with the current wit interface, I noticed some unusual behaviour with this endpoint while testing it.
For example, it didn't work with gpt-4. I got this error:
```json
"error": {
    "message": "The model `gpt-4` does not exist or you do not have access to it.",
    "type": "invalid_request_error",
    "param": null,
    "code": "model_not_found"
}
```
But when I tried using it with gpt-4o, it worked. Got a different error message, and rightly so 🫣
```json
"error": {
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.",
    "type": "insufficient_quota",
    "param": null,
    "code": "insufficient_quota"
}
```
I tried with Ollama using the gpt-oss:20b model and got this error:
```
404 page not found
```
Dived deep and realised Ollama doesn't have that endpoint. Ollama does have the /generate endpoint, but it is not compatible with what we need here. It returns this:
```json
{
    "model": "gpt-oss:20b",
    "created_at": "2025-08-28T21:17:25.432385Z",
    "response": "",
    "done": true,
    "done_reason": "load"
}
```
In conclusion, I think the current setup is fine; the User role matches the expected behaviour, similar to what the default remote LLM does.
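For reference, a hedged sketch of how the single-prompt wit interface lines up with chat completions under this setup: the prompt becomes one user-role message. This uses serde_json for brevity; the actual request struct in the PR differs.

```rust
// Illustrative only: wrap a single prompt in one "user" role message,
// producing an OpenAI-style chat completions request body.
use serde_json::json;

fn chat_completion_body(model: &str, prompt: &str) -> serde_json::Value {
    json!({
        "model": model,
        "messages": [
            { "role": "user", "content": prompt }
        ]
    })
}
```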
crates/factor-llm/src/spin.rs (Outdated)

```rust
    url: Url,
    auth_token: String,
    #[serde(default)]
    custom_llm: CustomLlm,
```
Probably calling it backend might make more sense than custom_llm as this refers to the backing API.
crates/llm-remote-http/src/lib.rs (Outdated)

```rust
#[derive(Debug, Default, serde::Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum CustomLlm {
```
This probably needs a better name. Something similar to the backend or something along those lines.
dicej left a comment
LGTM. I don't have much LLM API experience, so I can't comment on the details of how this uses the OpenAI APIs, but the structure of the code seems reasonable to me.
Is there any way we can automate testing this without necessarily spending money on an OpenAI account for CI, and without writing a mock API server from scratch (which might not be very useful anyway)?
Thanks for the review @dicej. There could be a way. Would look into that. On the other hand, the API is compatible with other similar LLM APIs, such as Ollama, which is free.
Force-pushed from e61e83a to ccc7b34
crates/llm-remote-http/src/lib.rs (Outdated)

```rust
struct CreateChatCompletionResponse {
    /// A unique identifier for the chat completion.
    #[serde(rename = "id")]
    _id: String,
```
Not sure why we're deserialising these presumably unused fields? I see you'd need them with deny_unknown_fields but that doesn't seem to be turned on...?
(This seems to be a general pattern throughout. If we need it then no worries, just looks odd to me.)
crates/llm-remote-http/src/lib.rs (Outdated)

```rust
}

#[derive(Deserialize)]
struct OpenAIEmbeddingUsage {
```
By the name, this seems specific to the OpenAI backend, which makes me wonder if CreateEmbeddingResponse is also OpenAI-specific. I'd expect OpenAI stuff to be in open_ai.rs but I'm not sure I'm understanding this right.
```rust
    user: None,
};

let chat_url = self
```
chat_url seems a misleading name for this.
```rust
let chat_url = self
    .url
    .join("/v1/chat/completions")
```
This string appears in multiple places: should we pull it out to a const?
```rust
let chat_url = self
    .url
    .join("/v1/embeddings")
```
Again, consider creating a const
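For example, something along these lines; EMBEDDINGS_ENDPOINT matches the name a later revision of this PR appears to use, while CHAT_COMPLETIONS_ENDPOINT is an assumed counterpart:

```rust
// Pull the repeated URL paths out into consts so each is defined in one place.
const CHAT_COMPLETIONS_ENDPOINT: &str = "/v1/chat/completions";
const EMBEDDINGS_ENDPOINT: &str = "/v1/embeddings";
```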
```rust
#[derive(serde::Deserialize)]
#[serde(untagged)]
enum CreateChatCompletionResponses {
    Success(CreateChatCompletionResponse),
```
This reinforces my concern about the location of CreateChatCompletionResponse
```rust
#[derive(serde::Deserialize)]
#[serde(untagged)]
enum CreateChatCompletionResponses {
```
Confused by this name. It appears to contain only one response?
```rust
#[derive(serde::Deserialize)]
#[serde(untagged)]
enum CreateEmbeddingResponses {
```
Again puzzled by the plural
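For what it's worth, a hedged reading of the pattern: with #[serde(untagged)], serde tries each variant in order, so a single type can absorb either the success payload or an OpenAI-style {"error": ...} payload from the same endpoint, which may be the reason for the plural name. The stand-in types below are illustrative, not taken from the PR.

```rust
use serde::Deserialize;

// Stand-ins for the real types in the PR; shapes are illustrative.
#[derive(Deserialize)]
struct CreateChatCompletionResponse {
    choices: Vec<serde_json::Value>,
}

#[derive(Deserialize)]
struct ApiError {
    message: String,
    code: Option<String>,
}

// With `untagged`, serde tries the variants in order until one deserialises,
// so this one enum covers both the success and the error response shapes.
#[derive(Deserialize)]
#[serde(untagged)]
enum CreateChatCompletionResponses {
    Success(CreateChatCompletionResponse),
    Error { error: ApiError },
}
```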
What is it the schema for? Is it OpenAI-specific? All these message types seem a bit randomly scattered across the files: could we clarify the rhyme and reason, e.g. so maintainers know where to put new types?
Yes, these are OpenAI-specific schemas (I'm probably using the word incorrectly).
I thought having them inside open_ai.rs might just make the file a bit crowded.
What I'd suggest:
- Move the file into a new `open_ai` folder
- Add `mod schemas;` to the top of `open_ai.rs`
- Change references in `open_ai.rs` from `crate::schemas` to just `schemas`

This will make the schemas module a sub-module of the open_ai module, which is a nice pattern for private stuff that would be too clutterful in the main file of the module.
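The same pattern, sketched inline as a single runnable file (module, type, and function names here are placeholders; in the PR the sub-module would live in open_ai/schemas.rs):

```rust
mod open_ai {
    // Private sub-module: its types don't clutter the parent module's file.
    mod schemas {
        #[derive(Debug)]
        pub struct CreateChatCompletionRequest {
            pub model: String,
        }
    }

    // Inside open_ai, references use `schemas::...` rather than
    // `crate::schemas::...`, and the types stay private to this module.
    pub fn describe_request(model: &str) -> String {
        let request = schemas::CreateChatCompletionRequest {
            model: model.to_string(),
        };
        format!("{request:?}")
    }
}

fn main() {
    println!("{}", open_ai::describe_request("gpt-4o"));
}
```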
Force-pushed from ccc7b34 to 78faf6b
```diff
@@ -0,0 +1,266 @@
+#![allow(dead_code)]
```
Where is the dead code? Why is it needed? If it is needed, could we limit this declaration to specific parts of the module rather than the whole thing (so that we don't lose checking on the whole module)?
```rust
}

#[derive(Serialize, Debug)]
struct CreateChatCompletionRequest {
```
I am still unclear on when I should put message types in this file vs when I should put them in schemas.rs.
```rust
#[derive(Deserialize)]
struct CreateChatCompletionResponse {
    /// A unique identifier for the chat completion.
    id: String,
```
I am confused. Is this why you allowed dead code? If we can't just omit the unused fields, then your previous strategy was right; but I would have thought we could omit them e.g.
```rust
struct CCCR {
    choices: ...,
    usage: ...,
}
```
But if that doesn't work then the _id prefix kludge is better than the dead code kludge.
Deserialization would fail if these fields are not provided, which is why I can't ignore them.
I'd revert to the underscore prefix
I don't understand why deserialisation would fail. By default, if there's no Rust field that matches a JSON field, the JSON field is ignored. You need to explicitly opt into the "fail on unknown JSON" behaviour, because you never know when the server will add a new field; and as far as I can tell you haven't done that. (You had me doubting myself but I tested it: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=14e024bd7b123e568dd368bc31dcbe95)
I feel like I'm missing something here - could you clarify the nature of the failure? Thanks!
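For anyone following along, a minimal, self-contained version of the behaviour described above (essentially what the playground link demonstrates): unknown JSON fields are ignored unless deny_unknown_fields is opted into. The struct here is a made-up stand-in.

```rust
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Completion {
    text: String,
}

fn main() {
    // "id" and "usage" have no matching Rust fields, yet deserialisation
    // still succeeds, because serde ignores unknown JSON fields by default.
    let json = r#"{ "id": "abc", "text": "hello", "usage": { "tokens": 3 } }"#;
    let parsed: Completion = serde_json::from_str(json).unwrap();
    println!("{parsed:?}");
}
```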
You're absolutely right, apologies.
I was confusing it with some issues I encountered earlier in this PR.
Force-pushed from 78faf6b to 6ce4fec
```rust
let url = self
    .url
    .join("EMBEDDINGS_ENDPOINT")
```

Suggested change:
```diff
-    .join("EMBEDDINGS_ENDPOINT")
+    .join(EMBEDDINGS_ENDPOINT)
```
This raises some concerns about testing for me: I can't see how this could have worked.
I appreciate automated testing is hard for a service like this but can you confirm that you've tested the current code manually (both OpenAI and the changes to classic) and maybe capture what tests you performed?
Thanks for catching that.
For testing, I manually tested using the example provided in this PR (./examples/open-ai-rust). It only tests inference, and I used it again before the last commit.
For embeddings, I also used it, but modified it slightly to ensure they worked. Used Ollama locally (their API is modelled closely on OpenAI's).
itowlson left a comment
LGTM apart from that typo: once that's fixed and you've fully re-run your tests, this should be good to go.
Force-pushed from 6ce4fec to b8f9973
```rust
impl From<CreateChatCompletionResponse> for wasi_llm::InferencingResult {
    fn from(value: CreateChatCompletionResponse) -> Self {
        Self {
            text: value.choices[0].message.content.clone(),
```
Do we need to defend against choices being empty? Or is it guaranteed to have an element?
While it isn't explicitly stated in the documents that there will be at least one, it is expected to always have at least one if it's successful.
I suppose it costs nothing to add a safeguard against such an event.
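A minimal sketch of such a safeguard, assuming an anyhow-style error is acceptable at this call site; the field layout mirrors the snippet above, and the stand-in types are illustrative.

```rust
// Stand-ins mirroring the fields used above; the real types live in the PR.
struct Message { content: String }
struct Choice { message: Message }
struct CreateChatCompletionResponse { choices: Vec<Choice> }

// Illustrative safeguard: fail with an error instead of panicking on
// `choices[0]` if the response unexpectedly contains no choices.
fn first_choice_text(response: &CreateChatCompletionResponse) -> anyhow::Result<String> {
    let first = response
        .choices
        .first()
        .ok_or_else(|| anyhow::anyhow!("chat completion response contained no choices"))?;
    Ok(first.message.content.clone())
}
```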
While preserving the default http client, it introduces an OpenAI client type which also supports APIs similar to OpenAI's specs. Also includes an example.

Signed-off-by: Aminu Oluwaseun Joshua <[email protected]>
Force-pushed from b8f9973 to 4226ea7
I am late here, but to answer:
We could technically run ollama with a small model to verify in CI integration tests, but whether it is worth it, I am not sure.
This PR aims to resolve #3211. The remote LLM implementation now provides two options: the default and OpenAI.
The default represents the current behaviour and, as the name states, is used when no option is provided.
It's expandable while preserving the original API and not breaking it.
Concerns arise regarding `wasi::llm` parameters that are not compatible with OpenAI's API documentation, which may necessitate adjustments to the current definitions available in wit. Documented here.

[Update]
In reference to @rylev's comment, there is no need for a change in the wit file; we're just going to use what's available.